A computed tomography urography‐based machine learning model for predicting preoperative pathological grade of upper urinary tract urothelial carcinoma

Abstract Objectives To develop and validate a computed tomography urography (CTU)‐based machine learning (ML) model for preoperative prediction of the pathological grade of upper urinary tract urothelial carcinoma (UTUC). Methods A total of 140 patients with UTUC who underwent CTU examination from January 2017 to August 2023 were retrospectively enrolled. Radiomics features were extracted from tumor lesions on the unenhanced, medullary, and excretory periods of CTU, respectively. Features were selected by Pearson and Spearman correlation analysis, the least absolute shrinkage and selection operator (LASSO) algorithm, random forest (RF), support vector machine (SVM), and eXtreme Gradient Boosting (XGBoost). Logistic regression (LR) was used to screen for independent influencing factors among the clinical baseline characteristics. A radiomics score was computed from the selected features, and diverse feature datasets were constructed. Based on the training set, 16 ML models were built on the different feature datasets with the LR, RF, SVM, and XGBoost algorithms, and their performance was evaluated on the validation set using sensitivity, specificity, accuracy, area under the receiver operating characteristic curve (AUC), and other metrics. Results The training set consisted of 98 patients (mean age: 64.5 ± 10.5 years; 30 males), whereas the validation set consisted of 42 patients (mean age: 65.3 ± 9.78 years; 17 males). Hydronephrosis was the best independent influencing factor (p < 0.05). The RF model had the best performance in predicting high‐grade UTUC, with AUCs of 0.914 (95% confidence interval [CI] 0.852–0.977) and 0.903 (95% CI 0.809–0.997) in the training set and validation set, and accuracies of 0.878 and 0.857, respectively.
Conclusions An ML model based on the RF algorithm exhibits excellent predictive performance, offering a non‐invasive approach for predicting preoperative high‐grade UTUC.


| INTRODUCTION
Upper urinary tract urothelial carcinoma (UTUC) is a relatively uncommon malignant neoplasm that arises in the renal pelvis and ureter, constituting merely 5%-10% of all cases of urothelial carcinoma.1 UTUC is predominantly observed in individuals aged between 70 and 90 years, with a twofold higher prevalence among males.2 Established prognostic factors such as tumor stage and pathological grade play significant roles in predicting outcomes for UTUC.3,4 Different tumor grades can show varying degrees of aggressiveness. A previous study has shown that pathological grade is an independent influencing factor of specific mortality in patients with UTUC.5 Another study demonstrated a significant positive correlation (p < 0.01) between postoperative pathological grade and 5-year mortality in UTUC patients.6 Hence, precise preoperative identification of the tumor's pathological grade is essential to guide subsequent treatment strategies effectively.7
Current diagnostic methods for UTUC encompass computed tomography urography (CTU), magnetic resonance imaging (MRI), urine cytology, and other techniques. The preferred radiological examination for diagnosing UTUC is CTU. A meta-analysis demonstrated that the sensitivity and specificity of CTU in UTUC diagnosis were 92% and 95%, respectively.8 Nevertheless, determining the tumor pathological grade from CTU examination remains challenging. Although ureteroscopy combined with ureteroscopic biopsy can help clarify the diagnosis, difficulties in specimen collection and in biopsy technique persist and impair the accuracy of pathological grading and tumor staging.9 Additionally, it is crucial to note that ureteroscopy or biopsy is a risk factor for intravesical recurrence of UTUC.10
The aggressiveness of tumors is often linked to heterogeneity, a vital characteristic of malignant tumors.11,12 Radiomics technology enables the assessment of tumor heterogeneity on images acquired during routine clinical practice.13,14 By extracting and evaluating features from digital images, it facilitates the detection of subtle changes and heterogeneity that may be imperceptible to the naked eye, thereby offering a novel approach for diagnosing tumor pathological grade.15 Radiomics features extracted from CT images have been shown to differentiate the pathological grade of renal cell carcinoma effectively.16 Despite its widespread application in various tumor diseases, the use of radiomics remains limited in UTUC.
The objective of this study is to extract radiomics features from CTU images and develop a non-invasive radiomics-based machine learning (ML) model capable of predicting high-grade UTUC pathology. This model aims to assist in clinical disease diagnosis, provide clarity on pathological grade, and offer a potential alternative to ureteroscopy and biopsy techniques.

| Patients
Patients diagnosed with UTUC at the Second Affiliated Hospital of Kunming Medical University between January 2017 and August 2023 were retrospectively included to form the training and validation sets for developing the ML model. Inclusion criteria were as follows: (1) UTUC confirmed by pathological examination following radical nephroureterectomy; (2) CTU performed within 1 month before the surgical procedure; (3) a minimum of three urine cytology examinations performed within 2 weeks preceding the surgery. Exclusion criteria were as follows: (1) coexistence with other tumors; (2) preoperative neoadjuvant therapy; (3) difficulty in obtaining CTU images; (4) difficulty in segmenting the tumor lesions; (5) no CTU examination performed before surgery. This retrospective study was approved by the Institutional Review Board of the Second Affiliated Hospital of Kunming Medical University, and informed consent was waived.
A total of 140 patients who met the inclusion and exclusion criteria were included in the training set and validation set (Figure 1). The clinical baseline characteristics collected included the patient's T-stage, age, gender, hydronephrosis, urine cytology, tumor location, and tumor type.

KEYWORDS
carcinoma, machine learning, pathological grade, radiomics, urinary tract

| Radiomics workflow
The CTU image acquisition process is described in detail in Appendix S1, and the radiomics workflow is illustrated in Figure 2.

| Urine cytological examination and pathological grading
All patients underwent a minimum of three urine cytology examinations within 2 weeks before surgery. The procedure and results of the cytology examinations are described in Appendix S1. Urinary cytologic samples from the UTUC patients in our study population were independently reexamined by two pathologists. Urine cytology results were classified according to the Paris System as either negative (Paris 1-3, 6) or positive (Paris 4, 5). Pathological grade was based on the WHO 2004/2016 classification system, which includes papillary urothelial neoplasms of low malignant potential (PUNLMP), low-grade urothelial carcinoma, and high-grade urothelial carcinoma. Inconsistent reports were resolved by a third senior pathologist. Pathological grades were classified as either low-grade (PUNLMP and low-grade urothelial carcinoma) or high-grade (high-grade urothelial carcinoma).

| Image segmentation
The 3D Slicer software (version 5.0.3; www.slicer.org) was used to segment the two-dimensional region of interest (2D-ROI) with the maximum cross-sectional area in the unenhanced period, medullary period, and excretion period of the CTU images, respectively. For each segmented lesion, a senior radiologist with 10 years of experience in urogenital imaging guided the manual segmentation to ensure accurate tumor margins. The 2D-ROI was placed at a distance of 1 mm from the tumor margin. Segmentation of tumors on the unenhanced period images was guided by the excretion period images.
To assess intraobserver reproducibility, 30 patients were randomly selected and their ROIs were segmented again after 1 month. The intraclass correlation coefficient (ICC) was used to evaluate the intraobserver consistency of feature extraction; an ICC > 0.75 indicated good consistency, and features with ICC < 0.75 were excluded.17,18 The remaining features were normalized using the Z-score, which involved subtracting their mean and dividing by their standard deviation (SD). The normalized features were subjected to a normality test, followed by Pearson correlation analysis for the normally distributed features and Spearman correlation analysis for the non-normally distributed features. The least absolute shrinkage and selection operator (LASSO) algorithm was then used to select among the remaining features. In addition, the ML algorithms support vector machine (SVM), random forest (RF), and eXtreme Gradient Boosting (XGBoost) were applied to the normalized features again to ensure the selection of optimal features. Univariate and multivariate logistic regression (LR) were used to identify the best independent influencing factor among the clinical baseline characteristics. The radiomics score was calculated as a linear combination of the selected features weighted by their LASSO coefficients. The formula was as follows:

$$\text{Radiomics score} = \sum_{i=1}^{n} \text{Coef}_i \times x_i$$

where $x_i$ is the normalized value of each selected feature, $\text{Coef}_i$ is the coefficient corresponding to each selected feature, and $n$ is the number of selected features.19
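The normalization and scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the feature names, raw values, and coefficients below are hypothetical, and the use of the sample SD is an assumption because the text does not specify sample versus population SD.

```python
from statistics import mean, stdev

def z_score(values):
    """Standardize one feature column: subtract the mean, divide by the SD."""
    mu = mean(values)
    sd = stdev(values)  # sample SD (assumption; the source does not specify)
    return [(v - mu) / sd for v in values]

def radiomics_score(normalized_features, coefficients):
    """Linear combination: sum of Coef_i * x_i over the selected features."""
    return sum(coefficients[name] * x for name, x in normalized_features.items())

# Hypothetical raw values of two selected features for four patients.
raw = {
    "feature_a": [10.0, 12.0, 14.0, 16.0],
    "feature_b": [0.2, 0.4, 0.1, 0.3],
}
# Hypothetical LASSO coefficients (illustrative only, not the study's values).
coef = {"feature_a": 0.5, "feature_b": -0.25}

normalized = {name: z_score(column) for name, column in raw.items()}
scores = [
    radiomics_score({name: normalized[name][i] for name in raw}, coef)
    for i in range(4)
]
```

Because every normalized feature has zero mean and the formula has no intercept term, the scores computed this way also average to zero across the patients used for normalization.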

| Construction of different feature dataset
Four feature datasets were constructed. First, the general-feature dataset was generated by selecting features with ICC > 0.75 and further screening them using Pearson and Spearman correlation analysis and the LASSO algorithm. Second, the clinical-feature dataset was established by adding the best independent influencing factor (p < 0.05) to the general-feature dataset. Third, the clinical-radiomics feature dataset was established by combining the best independent influencing factor with the radiomics score calculated from the general-feature dataset. Last, the optimal clinical-radiomics feature dataset was constructed by integrating the key features identified through the SVM, RF, and XGBoost algorithms with the clinical-radiomics feature dataset.

| Constructing diverse ML models
The four feature datasets were used to construct ML models with the LR, SVM, RF, and XGBoost algorithms. The SVM, RF, and XGBoost ML models were optimized through grid search. Subsequently, model performance on the training set and validation set was evaluated using the area under the receiver operating characteristic curve (AUC) with its 95% confidence interval (95% CI), sensitivity, specificity, and accuracy.
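As a reminder of how the evaluation metrics named above are defined, the following is a small pure-Python sketch. The labels, predicted probabilities, and the 0.5 decision threshold are hypothetical; the study itself does not state its threshold or which library computed the metrics. AUC is computed here through its rank interpretation: the probability that a randomly chosen high-grade case receives a higher score than a randomly chosen low-grade case.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = high-grade)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def classification_metrics(y_true, y_pred):
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "sensitivity": tp / (tp + fn),        # recall on high-grade cases
        "specificity": tn / (tn + fp),        # recall on low-grade cases
        "accuracy": (tp + tn) / len(y_true),
    }

def auc(y_true, y_score):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities for six patients.
y_true = [1, 1, 1, 1, 0, 0]
y_score = [0.9, 0.8, 0.6, 0.3, 0.4, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # threshold is an assumption

metrics = classification_metrics(y_true, y_pred)
toy_auc = auc(y_true, y_score)
```

In this toy example one high-grade case falls below the threshold, so sensitivity is 0.75 while specificity stays at 1.0, and seven of the eight positive-negative pairs are ordered correctly.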

| Statistical analysis
All statistical analyses were performed with R Studio (version 4.2.2) and Python software (version 3.7.3). Data conforming to a normal distribution were presented as mean ± standard deviation (x̄ ± s), and group comparisons were analyzed using the independent-sample t-test. Non-normally distributed data were expressed as median (upper and lower quartiles), and group comparisons were assessed using the Mann-Whitney U-test. Categorical variables were presented as constituent ratios or rates (%) and compared between groups using the chi-squared test (χ²) or Fisher's exact test. Statistical significance was defined as p < 0.05.

| Patient characteristics
In this study, a total of 140 patients with UTUC were included and randomly assigned to the training set (n = 98) and the validation set (n = 42) in a 7:3 ratio. Among them, there were 47 male patients and 93 female patients, with an average age of 64.7 ± 10.28 years. The cohort comprised 33 patients (23.6%) with low-grade pathology and 107 patients (76.4%) with high-grade pathology. In the training set, there were 23 patients (23.5%) with low-grade pathology and 75 patients (76.5%) with high-grade pathology, whereas in the validation set there were 10 patients (23.8%) with low-grade pathology and 32 patients (76.2%) with high-grade pathology. T-stage, hydronephrosis, and urine cytology showed statistically significant differences between low-grade UTUC and high-grade UTUC in both the training set and the validation set (p < 0.05). However, no statistically significant differences were observed in the other clinical baseline characteristics in these two sets (all p > 0.05) (Table 1).
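A 7:3 split of this kind can be sketched as follows. The text states only that patients were randomly assigned in a 7:3 ratio; splitting each grade separately (stratification) is an assumption made here for illustration, although it happens to reproduce the reported group sizes. The function name, seed, and label coding (0 = low-grade, 1 = high-grade) are hypothetical.

```python
import random

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, valid_idx), splitting each class separately."""
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)                              # random order within the class
        n_train = round(len(idx) * train_fraction)    # 70% of this class
        train_idx += idx[:n_train]
        valid_idx += idx[n_train:]
    return sorted(train_idx), sorted(valid_idx)

# 33 low-grade (0) and 107 high-grade (1) patients, as reported in the text.
labels = [0] * 33 + [1] * 107
train_idx, valid_idx = stratified_split(labels)

train_high = sum(labels[i] for i in train_idx)
valid_high = sum(labels[i] for i in valid_idx)
```

With these inputs the split yields 98 training and 42 validation patients, with 75 and 32 high-grade cases respectively, which matches the counts reported above.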

| Feature selection and construction of feature datasets
A total of 837 features were extracted per patient from each of the unenhanced, medullary, and excretion period images, giving a cumulative count of 2511 features. After retaining features with ICC > 0.75, 1079 features remained (Appendix S1). After removing redundant features by Pearson and Spearman correlation analysis, 437 features were retained. These 437 features were then processed by the LASSO algorithm (Figure 3). Four features were selected, and radiomics scores were calculated. In addition, owing to the high-dimensional nature of the features, the 1079 features were screened again by the SVM, RF, and XGBoost ML algorithms. Finally, the three most important features were included (Figure 4, Table 2). Among the clinical baseline characteristics, T-stage, hydronephrosis, and urine cytology were the independent influencing factors. Hydronephrosis was selected as the best independent influencing factor in this study (Table 3). In the general-feature dataset of the training set and validation set, there were significant differences in radiomics scores between low-grade and high-grade UTUC (both p < 0.05) (Figure 5). Based on the screened features, four feature datasets were created: the general-feature dataset, clinical-feature dataset, clinical-radiomics feature dataset, and optimal clinical-radiomics feature dataset (Table 4).
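The redundancy-removal step can be illustrated with a simple greedy correlation filter. This is only a sketch under stated assumptions: the 0.9 cutoff, the greedy keep-first strategy, and the toy feature values are hypothetical, since the text does not report the threshold or procedure used, and only the Pearson coefficient is shown (the study also applied Spearman correlation to non-normally distributed features).

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def drop_redundant(features, threshold=0.9):
    """Keep a feature only if |r| with every already-kept feature is below threshold."""
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical feature columns for five patients.
features = {
    "f1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "f2": [2.1, 3.9, 6.2, 8.0, 9.9],  # nearly a multiple of f1, so redundant
    "f3": [5.0, 1.0, 4.0, 2.0, 3.0],  # weakly correlated with f1
}
kept = drop_redundant(features)
```

Here `f2` is discarded because it is almost perfectly correlated with `f1`, while `f3` (r = -0.3 with `f1`) is retained. The result of a greedy filter depends on feature order, which is one reason published pipelines should report the exact rule used.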

| Evaluation of four ML models' performance
The AUC (95% CI), accuracy, sensitivity, and specificity of the four ML models across the different feature datasets are presented in Table 5. ROC curves for these models in both the training and validation sets are illustrated in Figure 6. All 16 ML models demonstrated promising efficacy in predicting high-grade UTUC within the validation set. Among them, the RF ML model exhibited the best performance when using the optimal clinical-radiomics feature dataset, achieving AUC values of 0.914 (95% CI: 0.852-0.977) and 0.903 (95% CI: 0.809-0.997) in the training and validation sets, respectively.

| DISCUSSION
In this study, we constructed 16 ML models based on the features extracted from the three periods of CTU images and selected the best-performing ML model to predict high-grade UTUC. In addition, we compared the performance of the four kinds of ML models by constructing different feature datasets. The results showed that all the ML models had good classification performance, and the ML model based on the RF algorithm had the best prediction performance, with the highest AUC values in the training set and validation set of 0.914 (95% CI: 0.852-0.977) and 0.903 (95% CI: 0.809-0.997), respectively. This model can be used as a non-invasive method for predicting the preoperative pathological grade of UTUC.
Currently, alongside invasive ureteroscopy and biopsy, radiological imaging plays an indispensable role in the diagnosis, staging, and follow-up of suspected upper urinary tract lesions.20 Among various radiological examination methods, CTU is the preferred imaging modality.21 Despite its high diagnostic sensitivity and specificity in identifying space-occupying lesions of the upper urinary tract, it still lacks accuracy in tumor grading. Therefore, we aimed to develop and validate a non-invasive ML model capable of accurately determining the pathological grade of UTUC.
In 2012, Lambin et al. introduced the concept of radiomics, which obtains digital information inside tumors through images.22 With the growing number of ML algorithms, radiomics research has become widely used in disease diagnosis and patient prognosis assessment.23 In a study similar to ours, which predicted the pathological grade of bladder cancer based on CT radiomics, the AUC value of the validation set was 0.860 (95% CI: 0.742-0.979) and the diagnostic accuracy was 0.838.24 Mammen et al. demonstrated significant differences in tumor texture between low-grade and high-grade UCC (p = 0.03) by applying CT texture analysis (CTTA) to distinguish the pathological grade of UTUC.25 These studies have established the feasibility of CT-based radiomics for pathological grading. However, the extracted texture features were derived from both unenhanced and enhanced CT images, without fully exploiting the available information in CT imaging data. Furthermore, an increasing number of studies are employing diverse ML algorithms for feature selection and model construction.26,27 Unfortunately, there is a paucity of studies investigating the application of this method for predicting the pathological grade of UTUC. In this study, we have developed a model that exhibits robust predictive ability. Additionally, we have explored various algorithms to construct feature datasets and identify the most representative radiomics datasets.

FIGURE 4
Feature selection through the random forest (RF) and eXtreme Gradient Boosting (XGBoost) machine learning (ML) algorithms. (A, B) The top 10 most important features were filtered through the RF and XGBoost ML algorithms, and the importance of the features gradually increased from top to bottom of the image.
Because CT has limited soft-tissue resolution, it is challenging to visually distinguish tumor tissue components on CT images in clinical practice. Consequently, there is a scarcity of radiomics assessments using CT for pathological grading in UTUC. Conversely, owing to its superior soft-tissue resolution, MRI exhibits commendable diagnostic performance when identifying pathological grade.28 However, MRI examination is associated with prolonged scanning time, contraindications for certain patients, and relatively high costs. Consequently, MRI may not be suitable for evaluating UTUC in all individuals. In cases where patients present with initial symptoms of low back pain or hematuria without obvious causative factors, CTU examination is more commonly employed than MRI. A radiomics ML model based on multidetector CT (MDCT), developed by Moldovanu et al., demonstrated excellent performance in predicting the pathological grade of renal tumors, with an AUC of 0.99 (95% CI: 0.92-1.00) in the validation set.29 This study highlights the significant potential of radiomics features derived from multimodal CT imaging for the accurate prediction of tumor pathology.
In our study, among the four ML models constructed using Data 1, the SVM model exhibited the best predictive performance in the validation set. Several studies that used CTTA to create a variety of ML models for tumor pathological grade prediction have consistently shown that SVM performs better than other ML models, which is in line with our findings.30,31 Furthermore, the inclusion of clinical baseline characteristics significantly enhanced the model's predictive performance. In line with our findings, a study demonstrated a significant association between hydronephrosis and the diagnosis of ureteral tumors (p = 0.0307).32 Notably, our study encompassed a larger sample size, enhancing the reliability of our results.
Our results show that the clinical-radiomics feature dataset (Data 3), which uses the computed radiomics score, does not significantly improve model performance compared with the clinical-feature dataset (Data 2), in which the score is not computed. In contrast to our findings, a prior study has demonstrated the potential of LR ML models based on MDCT-computed radiomics scores in enhancing predictive performance for renal tumor pathological grade.29 However, our LR ML model did not yield similar outcomes. This discrepancy may be attributed to differences in research subjects, divergent approaches to ROI segmentation, and distinct characteristics of the models.

FIGURE 5
The distribution of radiomics scores based on the general-feature dataset in the training set and validation set. Radiomics scores differed significantly between low-grade and high-grade UTUC in the training set and validation set, respectively. * p < 0.01; ** p < 0.001.

TABLE 4
The composition of diverse feature datasets.
Dataset type | Series | Construction of dataset
FIGURE 6
The performance of ML models to predict high-grade UTUC. (A-D) ROC curves of four kinds of ML models for the training set and validation set in different feature datasets (LR, SVM, XGBoost, RF). ML, machine learning; LR, logistic regression; RF, random forest; SVM, support vector machine; XGBoost, eXtreme Gradient Boosting.

To investigate the feature importance order of the optimal clinical-radiomics dataset in the RF model, we computed SHapley Additive exPlanations (SHAP) values to determine the ranking of features (Figure 7).36 The first-order feature "mean" primarily represents the average grayscale intensity within the ROI. In the excretion period of the CT image, the average grayscale intensity is generally higher than in the unenhanced period owing to tissue absorption of contrast agents.11 Zhang et al. employed CTTA to discriminate between low-grade and high-grade urothelial carcinomas. The study revealed significant differences in the "mean" feature between unenhanced and contrast-enhanced CT images for low-grade urothelial carcinomas (p < 0.001), with unenhanced images exhibiting a lower "mean".37 The dose of contrast agent may affect the gray value; however, this effect is not yet clear, and more research is needed. The gray-level co-occurrence matrix (GLCM) texture features assess the textural characteristics of an image by analyzing the spatial alignment statistics of pixel intensity.38 Numerous studies have substantiated the significance of GLCM features in assessing tumor heterogeneity. For instance, GLCM correlation has been employed to predict renal cell carcinoma with tumor thrombus,39 while GLCM "clustershade" has been used to predict pancreatic neuroendocrine tumor pathological grade and gastrointestinal stromal tumor prognosis.40,41 "Inverse difference moment normalized" (idmn) is used to evaluate the local homogeneity of the image. In this study, we observed a potential association between GLCM idmn and the pathological grade of UTUC. In patients with breast cancer, a substantial association has been shown between GLCM idmn and one-year relapse-free survival.42 Moreover, it was found that gray-level run-length matrix (GLRLM) features may be used to predict acute pulmonary thromboembolism,43 whereas neighboring gray tone difference matrix (NGTDM) features are linked to the prediction of pathological grade in bladder cancer.44 The aforementioned studies have demonstrated that radiomics features not only express, through digital information, tumor heterogeneity that is challenging to detect visually, but also show potential for evaluating tumor prognosis.12 In radiomics research, the acquisition of high-quality and artifact-free radiographic images is a prerequisite, while variations in image acquisition mode, matrix size, and algorithm construction may affect the accuracy or generalizability of radiomics features or models.45 As research advances, an increasing number of representative features are showing a positive correlation with tumor heterogeneity, highlighting the potential of highly correlated features as a predictive method for tumor pathological grading.
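The idea behind ranking features by SHAP values can be illustrated with exact Shapley values on a toy model. This sketch is not the procedure used in the study, which presumably relied on a SHAP implementation applied to the trained RF model: the brute-force enumeration of orderings, the all-zero baseline, and the toy scoring function with its weights are all hypothetical, and only the input names ("hydronephrosis", "mean", "idmn") echo features mentioned in the text.

```python
from itertools import permutations

def shapley_values(model, x, baseline):
    """Exact Shapley values: average each feature's marginal contribution over all orderings."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)        # start from the baseline input
        previous = model(current)
        for name in order:
            current[name] = x[name]     # switch this feature to the patient's value
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_model(f):
    """Hypothetical scoring function with one interaction term (not the study's RF model)."""
    return 0.6 * f["hydronephrosis"] + 0.3 * f["mean"] + 0.1 * f["mean"] * f["idmn"]

x = {"hydronephrosis": 1.0, "mean": 2.0, "idmn": 1.0}         # hypothetical patient
baseline = {"hydronephrosis": 0.0, "mean": 0.0, "idmn": 0.0}  # hypothetical reference

phi = shapley_values(toy_model, x, baseline)
ranking = sorted(phi, key=lambda name: abs(phi[name]), reverse=True)
```

The attributions sum to the difference between the model output for the patient and for the baseline, and the interaction term is shared equally between the two features involved. Exhaustive enumeration is feasible only for a handful of features, which is why practical SHAP implementations use model-specific or sampling-based approximations.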
There are several limitations in our study. First, owing to the retrospective design, potential selection bias may exist; therefore, further studies with a larger sample size and more comprehensive research are warranted. Second, the modeling data originated solely from a single center, and the model has not undergone external validation. External validation is still required to ascertain the model's generalizability. Third, the sample size of the study is relatively small, and additional data are needed to enhance the classification performance of the model. Lastly, the results could be influenced by factors such as image acquisition mode, reconstruction parameters, tumor segmentation method, and feature selection.
In conclusion, the CTU-based ML model holds great promise in predicting the pathological grade of UTUC. As a preoperative evaluation tool for the pathological grade of UTUC, the radiomics model can serve as a valuable supplementary method to ureteroscopic biopsy.

FIGURE 1
Flowchart of the study.

FIGURE 3
Feature selection using the least absolute shrinkage and selection operator (LASSO) algorithm. (A) Binomial deviance versus log(λ). Five-fold cross-validation was used to confirm the optimal tuning parameter (λ): λ min = 0.123, log(λ) = −2.096. (B) LASSO coefficients versus log(λ). Four features with the lowest LASSO coefficient were selected at log(λ) = −2.096.

FIGURE 7
SHAP values ranking the importance of each feature in the output of the random forest (RF) machine learning (ML) model. Different colors (red and blue) indicate different degrees of influence on the model output.
TABLE 1
Patient clinical characteristics.
Note: p-value, χ² or Fisher's exact test.

TABLE 2
Features selection based on four different algorithms.

TABLE 3
Univariate and multivariate LR analysis of the clinical baseline characteristics.

TABLE 5
Performance of different ML models in diverse feature datasets.